Abstract

The field of genomics is often cited as the branch of biology that has led the way in data sharing. In most cases, 
sequencing data are made publicly available immediately after generation and often before the data generators 
have completed their analyses. Although the pros of such openness cannot be denied, problems can arise when 
unpublished genomic data are shared. In this editorial we touch on these issues and discuss the roles and 
responsibilities of the data generators, data users and journal editors. 



The past decade has seen big changes in the field of gen- 
omics, not only in terms of advances in technology, but 
also with regard to the views on sharing the data gener- 
ated [1,2]. Open data has become the buzzword of this 
age. No one can deny that this openness and willingness 
to share genomic data, both published and (perhaps 
more importantly) unpublished, has resulted in remar- 
kable progress. However, when it comes to unpublished 
genomic data, this openness can also leave the data gen- 
erators vulnerable. The community needs to balance the 
benefits of data sharing against the interests of the data 
owners, and usually the process works well. 

The genomics community has measures in place to 
protect the data owners: data are often released under
embargoes (of varying lengths, but usually not longer
than 24 months), and data owners can also publish a
'statement of intent', i.e. an outline of the specific analyses
they plan to undertake, when they release the data. There are
also community norms (specifically the Bermuda rules
[3], and the Fort Lauderdale [4] and Toronto [5] agreements)
to help researchers navigate this rather sensitive
issue. However, embargoes are not indefinite, nor
does it seem fair to prohibit specific analyses indefinitely.
It is also worth clarifying that the agreements mentioned 
above are so-called gentlemen's agreements; they are not
law, and their utility depends on goodwill and communication
within the community, not unlike attribution and
the way scientists use citation to give credit. 

The key words, as we see it, are community and com- 
munication. The researchers in the field are essentially 
in the same boat: they could be the data generators in
one case and data users in another. Without communication,
the boat is likely to capsize. The data generators
need to be clear in their intentions and in specifying any 
conditions that the data are released under, and the data 
users need to inform the data generators and seek 
permission to use the data if appropriate. Perhaps also 
there is a need to have enforceable guidelines in place 
rather than relying on gentlemen's agreements? The 
US National Institutes of Health (NIH) have already 
taken a step in this direction and have recently released 
a draft policy on the sharing of genomic data [6], which, 
if approved, will be applicable to all researchers who re- 
ceive NIH funding. The guidelines cover, amongst other 
topics, the issue of when to release data; for raw se- 
quence data from non-human organisms, the specified 
deadline is within 6 months of submission to an ap- 
proved data repository. 

A question that follows is: whose responsibility is it to
ensure that appropriate permission has been acquired to 
include the analysis of unpublished genomic data in a 
manuscript? Does the responsibility lie with the authors 
or the reviewers or with the journal editors? In our ex- 
perience, such issues have usually been brought to light 
during the review process, but given the extensive 
amounts of data being generated, neither reviewers nor 
editors can be expected to be aware of the requirements 
for the use of each and every genome sequence. BMC 
Genomics has recently published a study by Zhao et al. 
[7] that included an analysis of 103 fungal genomes. After
publication it became apparent that some of these genomes
were unpublished, and that the authors had not informed
the data owners of their intention to publish an
analysis of these genomes. Given this situation, we and
the authors, in consultation with the data owners, 
agreed that a correction [8], whereby the authors would
remove specific genomes from the analysis, was the ap- 
propriate way to proceed. In fact, only two of the dis- 
puted genomes were specifically under embargo, but 
after discussion with the data generators the authors 
agreed to remove from the analysis not only the embargoed
genomes, but also seven additional, as yet unpublished,
genomes.

Data generators, data users and journal editors all have 
a role to play in ensuring that the interests of all in- 
volved parties are protected, and as we have mentioned, 
the key to this is communication. We feel the ultimate 
responsibility should lie with the data user; it is up to 
them to ensure that they are aware of (and adhere to) 
any conditions set by the data generators. The latter 
could also make it easier for the data users by ensuring 
that the necessary information is readily available. 

This is not to say that a journal has no responsibility,
however; a journal can increase awareness of the
requirements in a field by incorporating guidance
into its policies or instructions for authors. BioMed
Central's editorial policies [9] now include a section 
on the use of unpublished genomic data: "Authors using 
unpublished genomic data are expected to abide by the 
guidelines of the Fort Lauderdale and Toronto agree- 
ments. Based on broadly accepted scientific community 
standards, the key requirement for the third parties 
using genomic data is to contact the owners of unpub- 
lished data (i.e., the principal investigator and sequen- 
cing center) prior to undertaking their research, to 
advise them about their planned analyses." A journal is 
also, of course, responsible for taking the appropriate 
action when problems such as those exemplified by 
this case arise. Additionally, journal editors can facili- 
tate communication between the concerned parties 
and help them arrive at a mutually satisfactory solution.
Finally, a journal can instigate discussion on a
topic or issue by bringing it to light, as we are doing
by publishing this editorial.